ABSTRACT

The total estimated energy bill for data centers in 2010 was
$11.5 billion, and experts estimate that the energy cost of a
typical data center doubles every five years. On the other
hand, computational developments have started to lag behind
storage advancements, thereby becoming a future bottleneck
for the ongoing data growth, which already approaches
Exascale levels. We investigate the relationship between data
throughput and energy footprint on a large storage cluster,
with the goal of formalizing it as a metric that reflects the
trade-off between consistency and energy. Employing a
client-centric consistency approach, and while honouring the
ACID properties of the chosen columnar store for the case
study, Apache HBase, we present the factors involved in the
energy consumption of the system as well as lessons learned
to underpin further design of energy-efficient cluster-scale
storage systems. We show only the most relevant preliminary
results.

H.3.4 [Systems and Software]: Performance evaluation
(efficiency and effectiveness)

1. INTRODUCTION

As Big Data approaches Exascale levels, storage systems
start to face new challenges with regard to the volume,
variety and speed at which information needs to be processed
(velocity). Energy costs are a rising concern, as servers
are not designed to be power-proportional [2] and modern
networks may eventually be unable to cope with an already
oversubscribed model beyond access routers. In particular,
the power disproportionality of storage systems is usually
due to the heterogeneous consumption of disks as well as
memory instability [3].

In the database domain, the authors of [5] explored the
energy efficiency of databases and found that they were
unable to measure noticeable variations in power consumption
across workloads when varying the amount of memory accessed
and the access pattern applied (sequential vs. random memory
accesses).

On the other hand, while shared-nothing architectures allow
decoupling the underlying hardware infrastructure from the
computation, it is still not fully understood how to
efficiently adapt distributed storage systems so that they
sustain high throughput for mission-critical applications
while reducing overall energy bills.

2. EXPERIMENTAL STUDY 

We study HBase, a column-oriented data store that follows
the architectural design of BigTable and is suited for
random, real-time read/write access to Big Data. Our
findings show potential for improvement in this research
area. To the best of our knowledge, this is the first study
to date that focuses on the energy footprint of random read
and write workloads in a modern NoSQL data store (e.g.,
Apache HBase). To this end, we show the impact that
different workload types, consistency levels and concurrency
levels have on the total energy consumption of the storage
cluster as well as on its data throughput. Empirical results
are obtained through automated and reproducible experiments
developed for running an HBase cluster of machines on the
Grid'5000 [1] platform.

2.1 Approach 

Our methodology follows a client-centric consistency model
with two configurations: deferred-updates, with a buffer of
size 12MB (default in HBase), namely eventual; or without
buffer, namely strong. Both leverage the default Hadoop
packet size of 64KB (which in turn involves no buffer-copy).
Natively, HBase provides strong consistency semantics at
the row level and within a data center. Therefore, to
analyze the effect of deferring updates or not under a
strongly consistent architecture, we embed these semantics
into the HBase client of YCSB (Yahoo! Cloud Serving
Benchmark).

At the time of running the experiments we used a stable
version of HBase (hbase-0.94.8) in a cluster of 40 server
machines of the model Carri System CS-5393B with an Intel
Xeon X3440 CPU at 2.53 GHz, 16 GB of memory, 320 GB of
SATA II storage (ahci driver) and Gigabit Ethernet network
connectivity.

We experiment with three large workloads that fully stress
memory and therefore exercise the hard disks: write
intensive (80% writes), as in e-Commerce applications during
peak season due to flash crowds (e.g., Black Friday or Cyber
Monday); read intensive (80% reads), which is the usual
pattern in HBase with the messages application at Facebook;
and balanced (50%-50%), in order to see the effect of a
mixed workload. All of them use a uniform data distribution
in order to simulate random reads/writes to HBase, meaning
that each item is chosen uniformly at random. Energy
measurements are obtained through an API connected to the
power distribution units in the data center.
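
The difference between the two client configurations can be
illustrated with a minimal sketch. It is a toy model, not the
HBase or YCSB client code: the class BufferedWriter, the
function count_flushes, the 1 KB update size and the number of
updates are all hypothetical, and a "flush" simply stands for
one batch of updates sent to the servers.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class BufferedWriter:
    """Toy client-side write buffer (hypothetical, not the HBase API)."""
    buffer_bytes: int   # 0 means no buffer, i.e. the "strong" setup
    pending: int = 0    # bytes currently held back on the client
    flushes: int = 0    # batches sent to the servers so far

    def put(self, size: int) -> None:
        self.pending += size
        # Without a buffer every update is sent immediately; with a
        # buffer the client defers until the capacity is reached.
        if self.pending >= max(self.buffer_bytes, 1):
            self.flush()

    def flush(self) -> None:
        if self.pending > 0:
            self.flushes += 1
            self.pending = 0


def count_flushes(buffer_bytes: int, n_updates: int, update_bytes: int) -> int:
    """Number of batches needed to ship n_updates of update_bytes each."""
    writer = BufferedWriter(buffer_bytes)
    for _ in range(n_updates):
        writer.put(update_bytes)
    writer.flush()  # ship whatever is left at the end of the run
    return writer.flushes


# Assumed sizes for illustration only: 100,000 updates of 1 KB each.
strong = count_flushes(0, n_updates=100_000, update_bytes=1024)
eventual = count_flushes(12 * MB, n_updates=100_000, update_bytes=1024)
print(strong, eventual)
```

Under these assumed sizes the unbuffered client sends one
batch per update, while the 12MB buffer groups thousands of
updates per batch, which is the mechanism through which
deferred-updates trade update visibility for fewer round
trips.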


2.2 Modeling the trade-off 

With energy efficiency generally described as the work done
per unit of energy consumed, we express the impact of
deferred-updates (as in eventually consistent systems) as
the throughput obtained for a given amount of energy
consumed by the cluster. The hypothesis is that consistency
guarantees (the latency of update propagation) are offered
to the client in exchange for a given energy footprint on
the cluster. A simple model can give a good estimate of this
trade-off by considering the different factors leading to
the final amount of energy consumed, conveyed as the amount
of consistency offered with a given energy budget.
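
As a minimal sketch of how such a ratio could be computed, the
code below integrates power readings into energy and divides
the work done by it. It assumes that the metric is taken as
operations completed per kilowatt-hour, which is one common
reading of energy efficiency and not necessarily the exact
formulation used in this study. The function names and the
sample readings are hypothetical.

```python
def energy_kwh(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule.

    Returns the energy in kilowatt-hours (1 kWh = 3.6e6 joules).
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules / 3.6e6


def ops_per_kwh(total_ops, samples):
    """Assumed efficiency metric: operations completed per kWh."""
    energy = energy_kwh(samples)
    if energy <= 0:
        raise ValueError("need at least two samples with positive energy")
    return total_ops / energy


# Hypothetical readings: the cluster draws a constant 3600 W for 100 s.
samples = [(0, 3600.0), (50, 3600.0), (100, 3600.0)]
run_energy = energy_kwh(samples)
run_efficiency = ops_per_kwh(1_000_000, samples)
print(run_energy, run_efficiency)
```

Comparing this value between the eventual and strong
configurations for the same workload gives one number per
configuration that captures both throughput and energy.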


3. ANALYSIS AND CONCLUSIONS 

In this paper we analyze and characterize the energy
efficiency of three different workloads, which exhibit
different behaviors in terms of energy consumption and data
throughput. Strong delivers poorer performance and often
consumes the same or more energy than eventual (as in the
case of a write-intensive workload under high concurrency),
whereas eventual yields a substantial improvement in
throughput in all cases.

The most interesting case is the write workload. Eventual
(with buffer) achieves around 3x higher throughput under
high concurrency and consumes on average about 0.01
kilowatt-hours (kWh) less than the strong approach (without
buffer). Those savings increase as the number of concurrent
clients grows, because consumption stays steady with strong,
unlike the case of eventual. The case of reads is more
surprising: it reveals that in systems such as HBase, built
on top of a memory store, reads cost more energy per unit of
throughput. The balanced workload follows the same trend as
well, indicating the clear impact of reads once again.
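
To see how the two reported figures for the write workload
combine, the sketch below computes the relative gain in
operations per unit of energy. It assumes both runs last the
same wall-clock time, so that the ratio of operations equals
the ratio of throughputs. The baseline energy of the strong
run is not given in the text; the value 0.10 kWh used here is
purely hypothetical, as is the function name.

```python
def efficiency_gain(speedup, energy_strong_kwh, saving_kwh):
    """Ratio of (ops per kWh) for eventual over strong.

    speedup: throughput of eventual divided by throughput of strong.
    energy_strong_kwh: energy consumed by the strong run (assumed).
    saving_kwh: energy saved by eventual relative to strong.
    """
    energy_eventual_kwh = energy_strong_kwh - saving_kwh
    if energy_eventual_kwh <= 0:
        raise ValueError("saving must be smaller than the baseline energy")
    return speedup * energy_strong_kwh / energy_eventual_kwh


# 3x speedup and 0.01 kWh saving come from the text; 0.10 kWh is
# a hypothetical baseline chosen only to make the arithmetic concrete.
gain = efficiency_gain(speedup=3.0, energy_strong_kwh=0.10, saving_kwh=0.01)
print(round(gain, 2))
```

The smaller the baseline energy, the more the 0.01 kWh saving
adds to the gain already provided by the speedup.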

Therefore, access patterns, concurrency and consistency
lead to a given energy consumption for each type of
workload. This is highlighted as the relationship between
energy and throughput in a modern data store that is built
to scale with random reads and writes.

In turn, it should be possible to reach further energy
savings by applying write off-loading techniques on idle
HBase region servers that point to a common distributed file
system (HDFS). That is, changing request patterns by caching
such requests or moving them from the unused disks to
another location in the data center, and thereby expecting
to increase energy savings substantially, as in [4].

Figure 1: Energy vs. Speedup










